ABSTRACT


F2C is a source code converter which translates programs in ANSI
standard FORTRAN77 to equivalent programs in the C language. The
converter is built using general purpose language processing tools
which take high level specifications and generate programs which
can be combined to produce the converter. Programming in a
traditional programming language has been kept to a minimum.
1. TO BEGIN WITH ...


Source code converters convert programs written in one 
programming language to equivalent programs in another 
programming language. 

Why should one think of converting programs from one 
language to another? 

We see that in the past thirty years there has been an
explosive growth in the use of computers, and a large amount of
software has been developed, tested and put into commercial use.
This software has been developed on many different hardware
platforms, using widely differing operating system environments and
diverse programming languages. With the growth in technology come
increased performance expectations, and moving over to newer and
better hardware systems becomes inevitable. But what is to be done
with the existing programs, developed on older systems and already
thoroughly tested and proven? [1]

One would hope that these programs would run equally
well on the newer systems too. This would be the expectation at
least for those programs written in well known and widely used
high level languages like FORTRAN or COBOL.

But unfortunately, even the "standard" languages
supported by most of the vendors come with each vendor providing
his own nonstandard "extensions", "enhancements" and
"improvements", resulting in a proliferation of widely differing
dialects of the same language. Over fifty dialects of COBOL are
said to be extant! At least a dozen versions of FORTRAN exist!

Rewriting the programs entirely in the new dialect
becomes prohibitively costly, so programs are ported to new
dialects and/or environments. It would not be an exaggeration
to say that a significant portion of software development activity
in the industry involves "fixing" old software onto new systems.

Manual translation of code from one dialect to another
has many undesirable aspects to it:

1. The costs associated with software labour.

2. The costs related to testing every single
program so translated.

3. The loss due to "down time" of software during
translation.

The reasons mentioned above argue definitively for
automating source code conversion. Automatic converters, once
tested, eliminate the need to test the programs translated by
them.

Many automatic converters for translating one dialect of 
COBOL to another dialect of the same language have been developed 
in the industry. 

1.1 Different dialects ... Fine. But different languages? 

There are other situations, very different from those
described above, where converting code from one language not to
a different dialect of the same language but to a completely
different language becomes very desirable.

Consider the Ada language, sponsored by the Defense
Department of the United States of America. The Defense
Department, which also happens to be one of the biggest software
customers in the U.S., insists that all software developed for it be
written in Ada. This has naturally led to a widespread usage of
Ada, and extensive language support tools such as debuggers are
now available for Ada. Because of this, not only the Defense
Department but also other customers who have Ada platforms
prefer Ada to other languages. So those companies which have
developed applications in other languages now want to have their
programs converted into Ada programs. Presently, many automatic
converters from other major programming languages to Ada have
entered the U.S. market.

Consider another situation, where a FORTRAN programmer
wants to shift over to using the C language for writing
programs, so that he can make use of the better control constructs
and data structure support it provides. If he has already
written many applications in FORTRAN, he would not want to discard
them, and he would continue to use FORTRAN to build more
applications. This would prevent him from ever shifting over to
C. A FORTRAN to C converter could help make this transition
smooth.

1.2 My Thesis 

My thesis involves developing a converter from ANSI
standard FORTRAN-77 [2] to the C [3] language (F2C).

With the rapid spread of the UNIX operating system in
the last ten years, it has become the de facto industry standard
for operating system environments. Programmers have found the
UNIX programming environment extremely helpful in developing
programs. UNIX has been written in C, and perhaps because of that
the entire operating system environment has the C philosophy,
C thinking, ingrained into it. To make use of UNIX and the
innumerable programming tools built on it in an effective and
efficient way, learning C is almost indispensable. A programmer
using any other language would find himself handicapped in
integrating his programs with the tools available on the system to
build useful applications quickly. No other language is as well
supported as C, and no other language blends into the environment
so smoothly. Hence a large number of UNIX users who earlier
programmed in other languages now want to shift to writing
programs in C.

For a FORTRAN programmer, switching over to C is
possible only if his existing FORTRAN programs can be saved
for use with the new programs he would be writing in C. The
object codes of the two languages are not compatible because of
differences in parameter passing mechanisms and data storage
methods. Further, large quantities of standard software,
especially Mathematical Software Libraries written in FORTRAN
and compatible only with FORTRAN programs, have been available to
FORTRAN programmers for a long time now. The application areas
of a traditional FORTRAN programmer are such that without
the support of these Mathematical Software Libraries little
useful programming can be done. Software support in these areas
is not extensively available to a C programmer at present, because
the needs of a traditional C programmer are different. It would be
necessary to build these Mathematical Software Libraries in C if
switching over from FORTRAN to C is to be possible. Rewriting
these huge libraries in C would entail an enormous software
development effort.

With these factors considered, the need for an automatic 
source code converter from FORTRAN to C would be very obvious. 

Of the several versions of FORTRAN currently in use in
the industry, I have chosen the ANSI standard FORTRAN77 as the
source language to be converted because:

1. This standard has been in wide use for more than ten
years now, and all the FORTRAN compilers marketed in
the last ten years support this standard.

2. The differences, if any, between the different
compilers are in those features not fixed by the
ANSI standard, and these differences are small
enough that the converter needs to be enhanced
only slightly to accommodate these extensions to the
standard.

1.3 Doing it with Tools 

My thesis is not limited to just developing a source
code converter. It incorporates another important idea, viz., to
generate the converter from high level specifications of the
source and target languages using tools for program generation.

While it is certainly possible to write the entire
converter program in a traditional programming language like C,
the development costs would be enormous. Moving from the
prototype to the working version and changing specifications as
the program develops both become very costly in terms of time
and effort. Using program generating tools to build the converter
helps not only to cut down these costs but also to concentrate
better on the design aspects, so that we will be able to produce a
converter that is likely to be more reliable.

So the philosophy is - work with only specifications, 
leave it to the tools to generate programs. 



2. THE DESIGN 


2.1 Source Language Converters 

Language converters are programs very similar to
compilers. For a compiler, the source language is a high level
language and the target language is a machine language. In the
case of a converter, both the source and the target are likely to
be high level languages. This means that the front ends of
compilers and converters will be very similar. Both involve the
analysis of the source language program and the representation of
the source program in some form of intermediate representation
suitable for further processing.

Therefore, we will be employing methods and program 
generating tools used in compiler construction for lexical and 
syntax analyses of the source language. 

The use of tools for generating lexical analysers and
parsers from high level specifications is well understood and
routinely employed in the industry. However, the use of such
tools for syntax tree construction, and for further processing of
these syntax trees, is not as common as it should be. In the
development of F2C an attempt has been made to minimise
traditional programming and to use language processing tools at
all stages to generate the converter from language specifications.

2.2 The Structure of a Converter 

The organisation of a converter is as shown in
Figure 2.1. It can be divided into six stages:

1. Lexical Analysis: The source program is read by the
lexical analyser, which produces lexical units (tokens)
received by the next stage.
Figure 2.1 Structure of a Source to Source Program Converter
(diagram; surviving labels: C routines for generating data
object definitions, target program)
2. Syntax Analysis: The stream of tokens produced by the
lexical analyser is received by the parser, which
recognises the different syntactic constructs. At
this stage we also do some semantic checking, which
mainly involves collecting various attributes of
different identifiers and storing them in the symbol table.
3. Tree Building: This stage involves building syntax
trees for the source program as and when the different
syntactic constructs are recognised by the parser. So at
the end of this stage the syntax tree built, together
with the symbol table, contains all the information about
the input program necessary for making the language
conversion.

4. Tree Transformation: At this stage the syntax tree
built by the tree builder is transformed into the syntax
tree of the target language. This is achieved by
applying various transformations to different parts of
the original syntax tree. It is at this stage that the
actual conversion of the source program to a semantically
equivalent target program takes place.

5. Unparsing the Syntax Tree: During this stage the
converter unparses the transformed syntax tree, which now
represents the original program in target language
syntax. The information in the syntax tree, along with
that in the symbol tables, is used at this stage to
generate the target program in some form of internal
representation.

6. Formatting: The output of the unparser and other
routines will be in a coded form containing formatting
details and other information. The formatter converts
this into the actual target program.
2.3 The Tool Kit

Many different language processing tools have been used
for developing F2C. These are general purpose tools suited for
various language processing applications.

Lex [4] - The Lexical Analyser generator: This is a well
known tool on Unix. It takes regular expression
specifications for tokens and produces a lexical analyser
capable of recognising these tokens.

Yacc [5] - The Parser generator: This tool takes LALR(1)
grammar specifications and produces a parser capable of
recognising inputs satisfying the grammar specifications.
This requires a routine to act as a lexical analyser. Such
a routine is generally built using Lex, and hence Yacc has
been designed to work well in conjunction with Lex.

Treegen [6] - The Tree Manager: This tool is a new addition
to the kit of a language processor builder. This tool is
useful in all those situations where it is convenient to
store information in the form of a tree and process this
information by applying suitable transformations on the tree
structure. The processed information can be retrieved later
from the transformed tree. This tool has been used in F2C for
building syntax trees, transforming them and unparsing
them.

2.4 More about Treegen

Since Treegen is a relatively new language processing
tool, it is perhaps necessary here to give a more detailed
description of the tool.

Treegen is a tool which, from a set of high level input
specifications, generates routines to perform four different but
related sets of tasks.
1. Tree builder: These are routines which help build a
tree according to the NODE specifications given.

2. Tree Transformer: These routines help transform
different portions of the tree built, according to a set
of RULEs given in the input specifications.

3. Tree Unparser: These routines help to unparse the tree
according to a set of unparse specifications. They also
provide means to execute user defined functions at user
specified points in the course of unparsing the tree.

4. Formatter: The output of the unparser is in a coded
form with formatting details and other information.
This needs to be converted into user readable form.
This is done by the formatter. This formatter is not
actually generated by Treegen; however, it is a program
developed to work exclusively with Treegen.

5. Symbol Table Handler: These routines can be used in
language processing applications to build symbol tables and
resolve references to identifiers, taking into
consideration different kinds of scope rules.

This part of the Treegen tool has not been used in
building F2C, because the methods for collecting
attributes of symbols and resolving references to
symbols in FORTRAN77 are very different from those of
most other programming languages, and Treegen does not
have provisions to support the rather unusual needs of
FORTRAN77.

2.5 The Structure of F2C 

Figure 2.2 shows the complete structure of F2C with
data flow information, the different tools used for generating
various components of F2C and the type of input specifications
needed by these tools. It can be seen that, except for the routines
needed for symbol table management and those needed for generating
data object definitions from these symbol tables at the time of
tree unparsing, all other parts of the converter are generated with
the help of the tools. This method has been one of the main
features of F2C.


3. FROM IDEAS TO PROGRAMS 


In this chapter we shall consider some of the issues
involved in implementing a FORTRAN77 (F77) to C converter.

3.1 Lexical and Syntax analysis

The lexical and syntax analysis of FORTRAN (of all hues!)
is more difficult than that of many other languages because of
some unusual features of FORTRAN. These problems are well known
and do not require elaboration here.

In brief, the lexical analysis is complicated by these
facts:

1. Blanks, tabs and newlines (whitespace) can be embedded
inside a single lexical unit (token).

2. There may not always be a field separating character
between two different tokens.

3. Keywords are not reserved. One may use them as
identifiers.
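A classic consequence of these rules is that DO 10 I = 1, 10 begins
a DO loop, while DO 10 I = 1.10 assigns to a variable named DO10I.
The following is a minimal sketch, not F2C's lexer, of how a lexer
can decide this only after squeezing out blanks and looking for a
comma outside parentheses after the '='. It assumes the statement
text has already been assembled from its initial and continuation
lines with the label field removed, and it ignores character
constants; the function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst without blanks or tabs. FORTRAN ignores them outside
   character constants; this sketch does not handle character constants. */
static void squeeze_blanks(const char *src, char *dst, size_t size) {
    size_t n = 0;
    for (; *src != '\0' && n + 1 < size; src++)
        if (*src != ' ' && *src != '\t')
            dst[n++] = *src;
    dst[n] = '\0';
}

/* Hypothetical helper: 1 if the statement is a DO statement, 0 otherwise
   (for example an assignment to a variable named DO10I). A DO statement
   needs a comma outside parentheses somewhere after its '='. */
static int is_do_statement(const char *stmt) {
    char buf[1024];
    squeeze_blanks(stmt, buf, sizeof buf);
    if (toupper((unsigned char)buf[0]) != 'D' ||
        toupper((unsigned char)buf[1]) != 'O')
        return 0;
    const char *p = strchr(buf, '=');
    if (p == NULL)
        return 0;
    int depth = 0;
    for (p++; *p != '\0'; p++) {
        if (*p == '(')
            depth++;
        else if (*p == ')')
            depth--;
        else if (*p == ',' && depth == 0)
            return 1;               /* DO label index = m1, m2 ... */
    }
    return 0;
}
```

The keyword DO cannot be recognised from its first two characters
alone; the decision depends on text that appears much later in the
statement, which is why ordinary Lex rules alone do not suffice.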

Syntax analysis is made difficult because:

1. Array element references and function references have
identical syntax.

2. The Statement Function Statement, which is a component
of the Specification Part, has a syntax which could be
that of an Assignment Statement occurring in the
Execution Part.
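The second ambiguity cannot be settled by the grammar alone; it needs
the attributes collected so far. Here is a sketch of the decision for a
statement of the form NAME(args) = expr, assuming the parser can tell
whether NAME has been declared as an array and whether an executable
statement has already been seen. Both are passed in as hypothetical
parameters, and the function names are illustrative, not F2C's.

```c
#include <ctype.h>
#include <stdbool.h>

enum stmt_kind { STATEMENT_FUNCTION, ARRAY_ELEMENT_ASSIGNMENT, NOT_VALID };

/* True if args (blanks already removed) is empty or a comma separated
   list of symbolic names: a letter followed by letters or digits. */
static bool all_symbolic_names(const char *args) {
    if (*args == '\0')
        return true;
    bool at_start = true;
    for (; *args != '\0'; args++) {
        unsigned char c = (unsigned char)*args;
        if (c == ',') {
            if (at_start)
                return false;           /* empty name */
            at_start = true;
        } else if (at_start) {
            if (!isalpha(c))
                return false;
            at_start = false;
        } else if (!isalnum(c)) {
            return false;
        }
    }
    return !at_start;                   /* no trailing comma */
}

/* Hypothetical classifier for NAME(args) = expr. name_is_array comes from
   the attributes collected so far; seen_executable becomes true at the
   first executable statement of the program unit. */
static enum stmt_kind classify_paren_assignment(bool name_is_array,
                                                bool seen_executable,
                                                const char *args) {
    if (name_is_array)
        return ARRAY_ELEMENT_ASSIGNMENT;
    /* Statement functions must precede all executable statements and
       have only symbolic names as dummy arguments. */
    if (!seen_executable && all_symbolic_names(args))
        return STATEMENT_FUNCTION;
    return NOT_VALID;
}
```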
A problem specific to a converter like F2C, and not
encountered by a compiler writer, is this:

Comments are to be preserved in the converted code and hence
must be accommodated in the grammar. But comments can occur
virtually anywhere in the program, even within a token,
because of continuation lines.
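In F77 fixed-form source, a comment is recognised by column position
rather than by a token: a C or * in column 1 (or a line of blanks)
makes a comment line, and a character other than blank or zero in
column 6 makes a continuation line. Since comment lines may appear
between an initial line and its continuation lines, a converter that
preserves them must classify each line before tokenising. Here is a
minimal sketch; the enum and function names are hypothetical, lower
case c is accepted as most compilers do, and the sketch neither trims
columns beyond 72 nor checks that columns 1 to 5 of a continuation
line are blank.

```c
#include <string.h>

enum line_kind { COMMENT_LINE, INITIAL_LINE, CONTINUATION_LINE };

/* Classify one F77 fixed-form line. Columns are 1-based in the standard
   and 0-based here, so column 6 is line[5]. */
static enum line_kind classify_line(const char *line) {
    size_t len = strlen(line);
    if (len > 0 && (line[0] == 'C' || line[0] == 'c' || line[0] == '*'))
        return COMMENT_LINE;
    /* A line containing only blanks is also a comment line. */
    if (strspn(line, " ") == len)
        return COMMENT_LINE;
    if (len >= 6 && line[5] != ' ' && line[5] != '0')
        return CONTINUATION_LINE;
    return INITIAL_LINE;
}
```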

3.2 A Typical F77 program

The stages beyond parsing in the building of a converter 
from F77 to C may be explained best with the help of an actual 
F77 program. 


         1  C     FUNCTION TO COMPUTE THE VECTOR SUM OF A 1-D ARRAY
         2  C     OF INTEGERS AND FIND IF IT IS ZERO. RETURNS .TRUE.
         3  C     IF VECTOR SUM IS ZERO ELSE RETURNS .FALSE.
         4        LOGICAL FUNCTION VSUMZERO (VECT, NUMEL)
         5  C     VECT - ARRAY NAME. NUMEL - NO. OF ELEMENTS IN ARRAY
         6        INTEGER VECT, VSUM
         7        DIMENSION VECT(NUMEL)
         8        VSUM = 0
         9        DO 10 I = 1, NUMEL
        10           VSUM = VSUM + VECT(I)
        11  10    CONTINUE
        12        IF (VSUM .EQ. 0) THEN
        13           VSUMZERO = .TRUE.
        14        ELSE
        15           VSUMZERO = .FALSE.
        16        ENDIF
        17        RETURN
        18        END

Figure 3.1 An F77 Program


Note: The numbers at the left extreme end denote line numbers 
and are not part of the program. 





The figure above shows a typical F77 program unit. It
is a function subprogram which computes the vector sum of a one
dimensional array of integers and returns .TRUE. if that sum is
zero and .FALSE. otherwise.

3.3 Symbols and their attributes 

The lexical analyser puts the symbols in the symbol
table. However, the attributes of these symbols are to be
determined at the time of parsing and stored into the symbol
table. In F77, the attributes of symbols are not obtained at a
single point. They have to be collected by analysing all the
specification statements.

Consider, in Figure 3.1 above, the attributes of the
symbol VECT. From line 4 (function declaration statement) we
gather that it is a formal parameter to the function. Line 6 (type
declaration statement) tells us that it has type INTEGER. Line 7
(dimension statement) informs us that it is a one dimensional array
with the number of elements equal to the value of the formal
parameter NUMEL. The attribute set of VECT is a collection of all
these data. The types of undeclared symbols (e.g. NUMEL above)
are to be found from their names by referring to the implicit
typing information.
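A sketch of this piecemeal collection, assuming a very simple symbol
table entry (the structure and function names are hypothetical, not
F2C's): each specification statement fills in one attribute, and a
symbol with no type statement falls back on the default F77 implicit
typing, in which names beginning with I to N are INTEGER and all
others are REAL. IMPLICIT statements, which change these ranges, are
not modelled.

```c
#include <ctype.h>
#include <string.h>

enum f77_type { TYPE_UNSET, TYPE_INTEGER, TYPE_REAL, TYPE_LOGICAL };

/* Hypothetical symbol table entry, filled one statement at a time. */
struct symbol {
    char name[32];
    int is_formal;          /* from the FUNCTION statement (line 4) */
    enum f77_type type;     /* from a type statement (line 6), if any */
    int rank;               /* from a DIMENSION statement (line 7) */
    char bound[32];         /* upper bound of the dimension, as text */
};

static void note_formal(struct symbol *s) { s->is_formal = 1; }

static void note_type(struct symbol *s, enum f77_type t) { s->type = t; }

static void note_dimension(struct symbol *s, const char *upper) {
    s->rank = 1;
    strncpy(s->bound, upper, sizeof s->bound - 1);
    s->bound[sizeof s->bound - 1] = '\0';
}

/* Explicit type if one was declared, else default implicit typing. */
static enum f77_type resolved_type(const struct symbol *s) {
    if (s->type != TYPE_UNSET)
        return s->type;
    char c = (char)toupper((unsigned char)s->name[0]);
    return (c >= 'I' && c <= 'N') ? TYPE_INTEGER : TYPE_REAL;
}
```

For Figure 3.1, VECT receives three calls (formal, INTEGER, dimension
NUMEL), while NUMEL receives only note_formal and is typed INTEGER by
its initial letter.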

3.4 Tree building 

The Treegen tool takes high level input specifications
for the structure of different types of nodes and the relationships
between them, and produces definition tables and routines which
assist in syntax tree building. The type of a node is uniquely
associated with the node name.
There are three different kinds of nodes which may be
specified:

A leaf node is a node which forms a leaf of the syntax tree.

A list node has a variable number of sons of the same type.

An other node has a fixed number of sons, each of which may be
of any type.

List nodes and other nodes form the inner nodes of the syntax
tree. A NULL node is a special type of leaf node and represents
an empty subtree.

A portion of the NODE specifications relevant to
building the syntax tree of a function subprogram is given below.

        NODE

        function_subprogram :<
                son1: opt_comment_stmts,
                son2: function_stmt_part,
                son3: {specification_part, NULL},
                son4: {execution_part, NULL},
                son5: end_stmt_part
        >

The tree structure designated by the above specification 
is shown in Figure 3.2. The line numbers of the statements in 
Figure 3.1 which would correspond to the different subtrees are 
also shown. 


Figure 3.2 Graphical Representation of Sample
Node Specification


The structure of the children of the node
function_subprogram must be specified further in terms of other
nodes, and those in turn in terms of further nodes, and so on,
till the entire tree is described in terms of leaf nodes.

As a second example, the specifications for
label_do_stmt would be:

        label_do_stmt :<
                son1: {LABEL, NULL},        /* leaf */
                son2: DO,
                son3: INTNUM,
                son4: {arith_loop_control, c_for_loop},
                son5: STEND                 /* leaf */
        >

        arith_loop_control :<
                son1: opt_comma,
                son2: index,
                son3: / expr /,
                son4: / expr /,
                son5: {/ expr /, NULL}
        >

Thus the NODE specifications describe a template of tree
structure into which all possible syntax trees for the language
can fit.
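One way to read such a template is as a table of the node types
allowed in each son slot. The sketch below, with hypothetical type
codes and structure names (it is not Treegen's generated code), checks
that an other node's sons fit the function_subprogram template, where
NULL is an allowed alternative for son3 and son4.

```c
#include <stddef.h>

/* Hypothetical node type codes; Treegen derives its own from node names. */
enum node_type {
    T_NULL, T_OPT_COMMENT_STMTS, T_FUNCTION_STMT_PART,
    T_SPECIFICATION_PART, T_EXECUTION_PART, T_END_STMT_PART
};

#define MAX_ALTS 2

/* One son slot: the node types allowed in that position. */
struct slot {
    int nalts;
    enum node_type alts[MAX_ALTS];
};

/* Template for function_subprogram, following its NODE specification. */
static const struct slot function_subprogram_tmpl[] = {
    {1, {T_OPT_COMMENT_STMTS}},
    {1, {T_FUNCTION_STMT_PART}},
    {2, {T_SPECIFICATION_PART, T_NULL}},
    {2, {T_EXECUTION_PART, T_NULL}},
    {1, {T_END_STMT_PART}},
};

/* Returns 1 if the n son types fit the template, 0 otherwise. */
static int fits_template(const struct slot *tmpl, size_t nslots,
                         const enum node_type *sons, size_t n) {
    if (n != nslots)
        return 0;                   /* other nodes have a fixed arity */
    for (size_t i = 0; i < n; i++) {
        int ok = 0;
        for (int k = 0; k < tmpl[i].nalts; k++)
            if (tmpl[i].alts[k] == sons[i])
                ok = 1;
        if (!ok)
            return 0;
    }
    return 1;
}
```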

Treegen generates routines which assist in tree
building. Some of these are:

        1. NODE makenode(int nodename, int info);

        2. NODE makeleaf(int nodename, int info, char *uparse);

Here nodename is the user defined name for the node,
which will be converted by Treegen into a macro defining an
integer. The info field would be an integer representing any
information the programmer may wish to store in the node. In a
leaf node, uparse represents a character string to be printed in
the output while unparsing that leaf.

For other routines the reader is referred to [6].

The tree is built bottom up, starting with leaf nodes 
and proceeding to the root of the syntax tree. This tree building 
is done as a part of the actions in the parser as and when the 
parser recognises different components necessary for building 
nodes. 
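The bottom-up discipline can be pictured as a stack: makeleaf pushes a
new leaf, makenode pops the sons it needs and pushes the new inner
node, and pop retrieves a finished subtree. The sketch below re-creates
that discipline in plain C. All names are hypothetical stand-ins; in
particular, this makenode takes the number of sons as an extra
argument, whereas the Treegen routine takes only a node name and an
info value (the arity presumably coming from the NODE
specification). There are no stack overflow or allocation checks.

```c
#include <stdlib.h>

/* Hypothetical node: type code, info value, unparse string, sons. */
struct node {
    int type, info;
    const char *uparse;          /* only meaningful for leaves */
    int nsons;
    struct node **sons;
};

static struct node *stack[256];
static int sp = 0;

static void push(struct node *n) { stack[sp++] = n; }
static struct node *pop(void) { return stack[--sp]; }

/* Create a leaf and push it. */
static struct node *makeleaf(int type, int info, const char *uparse) {
    struct node *n = calloc(1, sizeof *n);
    n->type = type;
    n->info = info;
    n->uparse = uparse;
    push(n);
    return n;
}

/* Pop nsons subtrees (the rightmost son is on top of the stack),
   build an inner node over them and push it. */
static struct node *makenode(int type, int info, int nsons) {
    struct node *n = calloc(1, sizeof *n);
    n->type = type;
    n->info = info;
    n->nsons = nsons;
    n->sons = calloc((size_t)nsons, sizeof *n->sons);
    for (int i = nsons - 1; i >= 0; i--)
        n->sons[i] = pop();
    push(n);
    return n;
}
```

Because the sons are popped in reverse, a subtree built earlier than
some of its left siblings must be popped and pushed again after them,
which is exactly what the Yacc action in this section does.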
A sample of input to Yacc where actions contain calls to
the tree building routines is shown below.

        nonlabel_do_stmt : _LABEL _DO logical_loop_control _STEND
                {
                        register int p;

                        p = pop();      /* logical_loop_control */
                        makeleaf(LABEL, 1, 0);
                        makeleaf(DO, 0, 0);
                        push(p);
                        makeleaf(STEND, 0, 0);
                        makenode(nonlabel_logical_do_stmt, 0);
                        makenode(nonlabel_do_stmt, 0);
                }
                ;

The subtree for logical_loop_control has already been built and
pushed when this action runs, so it is popped and pushed back after
the LABEL and DO leaves to keep the sons in left to right order.

3.5 Tree Transformation 

This is the part of the converter where conversion of the
F77 syntax tree to a C language syntax tree takes place.

The input language to Treegen has provisions for
specifying RULEs for tree transformations. From these rules,
Treegen generates routines which apply these transformations
to the input tree whenever called.

A sample of the RULE specifications for tree
transformation is given below:
        RULE

        rule1 : expr_list (son1: VAR, son2: VAR) =>
                        subscript_expr_list (son1, son2).

        rule2 : arith_loop_control (son1: VAR, son2: VAR, son3: VAR,
                                    son4: VAR, son5: VAR) =>
                {
                        if ($son5 == NULL) {
                                makeleaf(INTNUM, 0, "1");
                                makenode(primary, 0);
                                $son5 = pop();
                        }
                }
                c_for_loop (initialiser (son2, son3),
                            termcheck (son4, son2, son5), step (son2, son5)).

What the rules mean:

rule1: This rule says that the node expr_list, two of whose sons
have labels son1 and son2 respectively, should be
replaced by a new node of type subscript_expr_list.
Further, the new node subscript_expr_list must have
the same two children as the old node expr_list. In F77,
array references and function references are syntactically
identical, and the grammar for the parser cannot distinguish
between them. In C, array and function references have
different syntax. To distinguish between the two, in the
case of an array the expression list following the array
name is transformed into a different type of node, called a
subscript expression list, using the above rule.


rule2: This rule is more complicated. It has some C code
inserted before the target node specification.
This code is executed before applying the specified
transformation. Here it supplies a default step of 1 (an
INTNUM leaf wrapped in a primary node) when the F77 DO
statement has no increment expression.

In this rule the arithmetic loop control in F77, used
for controlling the execution of a DO loop, is converted
to a semantically equivalent loop control specification
for a for loop of C. The c_for_loop has three children:
the index initialising part (initialiser), the loop
termination checking condition (termcheck) and the
incrementing step (step). These children of c_for_loop
are functions of the children of the F77
arith_loop_control, and this information is conveyed
by the use of the label names son1, son2, etc.
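For the conversion to be semantically equivalent, the C loop must
respect the F77 rule that a DO loop executes MAX(INT((m2 - m1 + m3)/m3), 0)
times, the count being fixed on entry to the loop, where m1, m2 and m3
are the initial, terminal and increment values. The sketch below shows
one possible emitter that honours this rule with a hidden trip
counter, supplying the step "1" when it is absent just as rule2 does.
It is an illustration under stated assumptions, not what F2C's
initialiser, termcheck and step nodes actually produce: the function
names and the counter variable name are hypothetical, and the
expressions are assumed to be free of side effects, because the step
text is repeated in the emitted header.

```c
#include <stddef.h>
#include <stdio.h>

/* F77 iteration count MAX(INT((m2 - m1 + m3) / m3), 0). For integers,
   C's '/' truncates toward zero, as INT does. m3 must not be zero. */
static long do_trip_count(long m1, long m2, long m3) {
    long n = (m2 - m1 + m3) / m3;
    return n > 0 ? n : 0;
}

/* Hypothetical emitter for DO label index = init, limit [, step].
   A NULL step defaults to "1", as in rule2. The count is computed once,
   on entry, into the counter variable; the comma operator guarantees
   that index is assigned before the count expression reads it. */
static int emit_for_header(char *buf, size_t size, const char *index,
                           const char *init, const char *limit,
                           const char *step, const char *counter) {
    if (step == NULL)
        step = "1";
    return snprintf(buf, size,
                    "for (%s = %s, %s = ((%s) - %s + (%s)) / (%s); "
                    "%s > 0; %s += (%s), %s--)",
                    index, init, counter, limit, index, step, step,
                    counter, index, step, counter);
}
```

For the loop on line 9 of Figure 3.1, with a counter named i_count,
the emitter produces
for (I = 1, i_count = ((NUMEL) - I + (1)) / (1); i_count > 0; I += (1), i_count--).
A simpler test such as I <= NUMEL would be wrong for negative steps
and would re-read NUMEL on every iteration.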

The routine generated by Treegen which does the
transformation is:

        transform(NODE *ptr_to_root_of_tree);

where ptr_to_root_of_tree is a pointer to the pointer
referring to the root of the tree (or subtree) to be
transformed.
The transformer routine traverses the tree in a depth
first, left to right order, looking for nodes to be transformed as
specified by the rules. Whenever a pattern (a node listed on the
left hand side of a rule) is found, it applies the appropriate
transformation to it. Transformations on the tree are applied in
a bottom up manner. A subtree, once traversed, is normally not
traversed again. However, it is possible to specify, if needed,
that a rematch for a pattern should be searched for in a
transformed subtree.
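A minimal sketch of such a traversal, assuming a simple node
structure and a single rule in the spirit of rule1: inside a reference
whose name is known to be an array, the expr_list son is replaced by a
new subscript_expr_list node with the same sons. The node type codes,
the is_array flag and the function names are hypothetical; Treegen
generates its own matcher from the RULE section. Sons are processed
before their parent, so transformations are applied bottom up, and a
replaced subtree is not revisited.

```c
#include <stdlib.h>

/* Hypothetical node type codes. */
enum { T_NAME, T_EXPR_LIST, T_SUBSCRIPT_EXPR_LIST, T_REF };

struct node {
    int type;
    int is_array;            /* for T_REF: set from the symbol table */
    int nsons;
    struct node *sons[4];
};

/* The single rule: array reference => subscript_expr_list son. */
static void apply_rule(struct node *n) {
    if (n->type == T_REF && n->is_array && n->nsons == 2 &&
        n->sons[1] != NULL && n->sons[1]->type == T_EXPR_LIST) {
        struct node *old = n->sons[1];
        struct node *repl = malloc(sizeof *repl);
        if (repl == NULL)
            return;                      /* leave the tree unchanged */
        *repl = *old;                    /* same sons as the old node */
        repl->type = T_SUBSCRIPT_EXPR_LIST;
        n->sons[1] = repl;
        free(old);
    }
}

/* Depth first, left to right; each node is handled after its sons. */
static void transform(struct node *n) {
    if (n == NULL)
        return;
    for (int i = 0; i < n->nsons; i++)
        transform(n->sons[i]);
    apply_rule(n);
}
```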

3.6 Unparsing the tree 

To get the program in the target language (in our case
C), the transformed syntax tree has to be unparsed. Along with
NODE specifications, one may give specifications for unparsing
different nodes. The unparsing specifications contain the
following information:

1. Print strings: Strings which must be copied into the
output stream of the unparser at different points during
the unparsing of the node.

2. User functions: User defined functions which must be
executed at various points during unparsing. The node
being unparsed is passed as a parameter to these
functions. The names of the functions used in this
manner must be listed under the FUNCTION section in the
specifications.

3. C code: Any C code which must be executed at some point
during the unparsing of a node.
4. Which sons and in what order?: During unparsing of an
other node, one may choose to omit one or more sons of the
node, that is, the entire subtrees under those sons. This
is indicated by simply omitting the names of those sons in
the unparse specifications. Also, the subtrees under a
node may be unparsed in any order by listing the sons in
the required order.

The unparse specifications are entirely optional. One 
may omit these for any node. In such a case that subtree will not 
be unparsed during the unparsing of the tree. 

Some examples of unparsing specifications are given
below.

1. Leaf unparsing:

        STEND :< >
               < doclose_print >

This specification says that when the leaf node STEND is
unparsed, a string should be printed first. Then the
unparse string stored in the node STEND should be printed.
Then a user defined function doclose_print should be executed
with the node being unparsed as parameter. In this example,
doclose_print checks the info field of the node and prints the
string "}" as many times as the value of info. This is
necessary to see that the non-block do-loops, which may be
terminated by a labelled action statement, are closed
correctly.
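The count stored in info arises because several non-block DO loops
may share one terminal statement, as in DO 10 I ... DO 10 J ...
10 CONTINUE, which needs two closing braces in C. A minimal sketch,
assuming the parser keeps a stack of the terminal labels of open DO
loops (a hypothetical data structure; the thesis does not say how the
count is obtained), computes that count when a labelled statement is
reached and prints one brace per loop.

```c
#include <stdio.h>

/* Hypothetical stack of the terminal labels of the open DO loops. */
static int do_labels[64];
static int ndo = 0;

/* Called for each statement  DO label ...  (no overflow check). */
static void open_do(int label) { do_labels[ndo++] = label; }

/* Called for a statement carrying 'label': pops every open DO loop that
   ends here and returns how many there were, i.e. the value to store
   in the info field of that statement's STEND leaf. */
static int close_count(int label) {
    int count = 0;
    while (ndo > 0 && do_labels[ndo - 1] == label) {
        ndo--;
        count++;
    }
    return count;
}

/* Print one closing brace per DO loop ending at this statement. */
static void doclose_print(FILE *out, int info) {
    for (int i = 0; i < info; i++)
        fputs("}", out);
}
```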
2. List unparsing:

        symname_list :<
                list: SYMNAME
        >
        < "" "," "" "" >

This unparse specification for the list node
symname_list has four strings in it. They represent, in order,
the string to be printed before unparsing the list node,
the string to be printed after unparsing every son of the list,
the string to be printed after completing the unparsing of all
sons, and the string to be printed if the list has no sons.

Unparsing of symname_list produces a comma separated
list of names.
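The four-string convention can be sketched as a small C function over
sons that have already been unparsed to strings. The sketch assumes
that the second string acts as a separator, written after every son
except the last (which is what a comma separated list requires), and
that only the fourth string is written for an empty list; whether
Treegen behaves exactly this way is not stated here. The function
names and signatures are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append s to buf (of total size 'size'), truncating if necessary. */
static void append(char *buf, size_t size, const char *s) {
    size_t used = strlen(buf);
    if (used + 1 < size)
        snprintf(buf + used, size - used, "%s", s);
}

/* Unparse a list node whose n sons are already unparsed strings.
   buf must have room for at least one character. */
static void unparse_list(char *buf, size_t size,
                         const char *const *sons, size_t n,
                         const char *before, const char *between,
                         const char *after, const char *if_empty) {
    buf[0] = '\0';
    if (n == 0) {
        append(buf, size, if_empty);
        return;
    }
    append(buf, size, before);
    for (size_t i = 0; i < n; i++) {
        append(buf, size, sons[i]);
        if (i + 1 < n)
            append(buf, size, between);
    }
    append(buf, size, after);
}
```

With the strings "", ",", "" and "", the sons VECT and NUMEL are
unparsed as VECT,NUMEL.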


3. Other node unparsing:

        main_program :<
                son1: opt_comment_stmts,
                son2: {program_stmt_part, NULL},
                son3: {specification_part, NULL},
                son4: {execution_part, NULL},
                son5: end_stmt_part
        >
        < son1 son2 {gen_other_defs();} son4 son5 >

This specification says that while unparsing main_program,
son1 and son2 must be unparsed, after which the user written C
code within { and } must be executed. After that, son4 and son5
should be unparsed in that order. We see that son3 is omitted in
the specification. This means that son3 is not unparsed.

The function gen_other_defs(), called from this C code,
generates definitions for all the variables and functions used in
the main program. The C definitions are thus produced from the
symbol table rather than by unparsing the F77 specification part,
which is why son3 can be omitted.

3.7 Formatting Unparser output 


The output of the unparser represents the target C
program in a coded internal representation. To get the text of
the C program, this must be passed through a formatter written
exclusively to be used with the output of the unparser. Samples
of the formatter input and the corresponding output are given
below.


Input:

        [coded unparser output for a program fragment labelled ftoc_7001]

Output:

        [formatted C text: the label ftoc_7001:, followed by a for loop
        on I bounded by IMXSIZ, an assignment to an element of DUMMY,
        and an if ... else statement that assigns to B in both branches]


3.8 Getting Semantics Right

There are many issues in the conversion of F77 to C where
care must be taken to preserve the semantics of the original program
in the translated program. A few of them are discussed here.

1. Arrays: Arrays in F77 have subscripts starting from 1 (not
always, but other lower bounds are not yet implemented in
F2C). In C, array subscripts start with 0. In translation,
the arrays are declared to have one more element than the
corresponding array in the F77 program. The element with
subscript 0 is ignored. The other subscripts are mapped
one-to-one, i.e., array element A(30) of the F77 program
would be mapped to A[30] in the C program.
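The mapping can be seen in a small C fragment. It assumes, as the
text does, a lower bound of 1, and for simplicity a fixed extent
rather than the adjustable dimension of Figure 3.1; the names are
borrowed from that figure only for illustration.

```c
#define NUMEL 5

/* F77:  INTEGER VECT(NUMEL)
   C:    one extra element is declared; VECT[0] is never used. */
static int VECT[NUMEL + 1];

/* The loop on lines 9 to 11 of Figure 3.1 keeps its subscripts
   unchanged: VECT(I) becomes VECT[i] for i = 1 .. NUMEL. */
static int vector_sum(void) {
    int vsum = 0;
    for (int i = 1; i <= NUMEL; i++)
        vsum += VECT[i];
    return vsum;
}
```

The cost of this scheme is one unused element per array; the
alternative, subtracting 1 from every subscript expression, would
keep storage exact but make the generated code harder to compare with
the original.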


2. Parameter passing: In F77, parameters are always passed by
reference. When expressions or other objects with no lvalue
are passed, their rvalue is stored in a compiler-generated
temporary variable, and the address of that temporary is
passed. In C, parameters are passed by value; the effect of
call by reference is obtained by explicitly passing a pointer
to the object. For arrays this is not necessary, because an
array name is treated as a pointer to the beginning of the
array. To maintain the F77 semantics in translation, a pointer
must be passed even for actual parameters which have no
lvalue, e.g. an expression, so that the called function can
access the value by dereferencing the pointer received
through the formal parameter. This can be done by simulating
in C what the F77 compiler does while passing parameters:
assign the expression to a temporary variable generated by
the converter, then take a pointer to that temporary and pass
it to the function. This must be done only for objects with no
lvalue; array names may be passed without any change.
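A sketch of the scheme just described, under stated assumptions: the subroutines TWICE and ZERO, the wrapper call_twice and the temporary name ftoc_tmp1 are all hypothetical, chosen only to show the three cases (variable, expression, array name).

```c
#include <assert.h>

/* Hypothetical F77 subroutine:
         SUBROUTINE TWICE(X, Y)
         INTEGER X, Y
         Y = 2*X
         END
   Every F77 argument is passed by reference, so each formal
   parameter becomes a pointer in C. */
void TWICE(int *X, int *Y)
{
    *Y = 2 * *X;
}

/* Hypothetical F77 subroutine taking an array:
         SUBROUTINE ZERO(A, N)
         INTEGER A(*), N
   The array name already acts as a pointer, so A needs no change;
   the scalar N still arrives as a pointer. */
void ZERO(int A[], int *N)
{
    int I;
    for (I = 1; I <= *N; I++)   /* 1-based, as in the F77 source */
        A[I] = 0;
}

/* Translation of the hypothetical call  CALL TWICE(K+1, M).
   K+1 has no lvalue, so its value is first stored in a
   converter-generated temporary (ftoc_tmp1 is a made-up name) and the
   temporary's address is passed; M has an lvalue, so &M is passed. */
int call_twice(int K)
{
    int M;
    int ftoc_tmp1;

    ftoc_tmp1 = K + 1;
    TWICE(&ftoc_tmp1, &M);
    return M;
}
```

Because the temporary is a fresh local, an assignment to X inside TWICE cannot corrupt any variable of the caller, which mirrors what an F77 compiler does for expression arguments.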

3. COMMON objects: Data objects declared in an F77 COMMON
statement are converted into global data objects in C. All
global data object definitions are collected in a single file,
which is used as an include file. All the routines of an
executable program containing references to these COMMON
objects must be translated together, so that all the global
definitions are in one place.

The translation is done in such a way that an array
reference A(expr), for an array which is part of a common block
BLOCK1 defined in a function subprogram VSUMZERO, would be seen by
the C compiler that compiles the target program as
BLOCK1.VSUMZERO.A[expr].
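The access form BLOCK1.VSUMZERO.A[expr] suggests one global object per COMMON block with a member per subprogram. The sketch below assumes a union of per-subprogram structs; this layout, the second subprogram MAIN and the helper functions are assumptions for illustration, not taken from F2C. Both views use the same layout (one extra element for the unused subscript 0), so X(I) and A(I) share storage as F77 requires.

```c
#include <assert.h>

/* Hypothetical F77 source: two subprograms share COMMON /BLOCK1/,
   each under its own local array name:
       in MAIN:      INTEGER X(10)
                     COMMON /BLOCK1/ X
       in VSUMZERO:  INTEGER A(10)
                     COMMON /BLOCK1/ A
   Assumed translation: one global union with a struct member per
   subprogram view. Both structs start with an int[11] member, so
   they share a common initial sequence and the same storage. */
union {
    struct { int X[10 + 1]; } MAIN;
    struct { int A[10 + 1]; } VSUMZERO;
} BLOCK1;

/* Translation of  X(3) = 42  inside MAIN. */
void main_store(void)
{
    BLOCK1.MAIN.X[3] = 42;
}

/* Translation of a reference to  A(I)  inside VSUMZERO. */
int vsumzero_fetch(int i)
{
    return BLOCK1.VSUMZERO.A[i];
}
```

If two subprograms declared different sizes or member lists for the same block, the extra-element padding would shift later members out of alignment, so a real converter would need a more careful offset scheme than this sketch.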

Apart from the issues discussed above, there are many
other issues, viz. variables in EQUIVALENCE statements and those
in SAVE statements, which must be carefully considered while
translating F77 to C.
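As one illustration of the SAVE issue, a SAVEd local variable must keep its value between calls, and a C static local has exactly that property. This is a sketch of one possible mapping, not current F2C output; the function NEXTID is hypothetical, and DATA initialisation (not yet handled by F2C) is modelled by the static initializer.

```c
#include <assert.h>

/* Hypothetical F77 source:
         INTEGER FUNCTION NEXTID()
         INTEGER ICOUNT
         SAVE ICOUNT
         DATA ICOUNT /0/
         ICOUNT = ICOUNT + 1
         NEXTID = ICOUNT
         END
   The SAVEd local becomes a C 'static' local, which keeps its value
   between calls; the initializer plays the role of the DATA statement. */
int NEXTID(void)
{
    static int ICOUNT = 0;
    int ftoc_ret_val;

    ICOUNT = ICOUNT + 1;
    ftoc_ret_val = ICOUNT;
    return ftoc_ret_val;
}
```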

Some sample F77 programs and their translations using
F2C are given in Appendix A.



4. TOWARDS A BETTER F2C 


4.1 A Complete Converter 

Does F2C translate any and every FORTRAN77 program?
Unfortunately, the answer is no.

As currently implemented, F2C does not accept the
complete FORTRAN77 language. Within the time available, the
entire F77 language could not be accommodated in the converter.

To make F2C accept the complete F77 language, the
following features need to be added.

1. Data object specifications: Some of the data object
specifications have not been handled. These include COMMON
and EQUIVALENCE objects, NAMELIST sets, INTRINSIC and
EXTERNAL functions which may be passed as parameters, and data
initialisation through DATA statements. COMPLEX objects have
not been handled either, nor have BLOCK DATA subprograms, which
deal exclusively with specifications. These features must be
added.

2. I/O statements: Many of the I/O handling statements,
viz. the INQUIRE, BACKSPACE and ENDFILE statements, have not
been implemented. Only READ and WRITE statements, with limited
provisions, are implemented. Other file handling features also
need to be added.

3. Execution constructs: Computed GOTO and assigned GOTO
statements are not handled at present. They must be
accommodated.
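One possible translation of the computed GOTO, sketched here as an assumption rather than a design from this work: a C switch whose cases jump to the converted labels. The F77 fragment, the function classify and the variable KOUNT are hypothetical; the ftoc_ label prefix follows the convention seen in the converter's output. By the F77 standard, if the index is outside 1..n, execution simply continues with the next statement.

```c
#include <assert.h>

/* Hypothetical F77 fragment:
         GO TO (10, 20, 30), K
         KOUNT = 0
         GO TO 40
      10 KOUNT = 1
         GO TO 40
      20 KOUNT = 2
         GO TO 40
      30 KOUNT = 3
      40 CONTINUE
   K = 1, 2, 3 selects the K-th label; any other K falls through to
   the statement after the GO TO. */
int classify(int K)
{
    int KOUNT;

    switch (K) {
    case 1: goto ftoc_10;
    case 2: goto ftoc_20;
    case 3: goto ftoc_30;
    default: break;        /* out of range: continue with next statement */
    }
    KOUNT = 0;
    goto ftoc_40;
ftoc_10:
    KOUNT = 1;
    goto ftoc_40;
ftoc_20:
    KOUNT = 2;
    goto ftoc_40;
ftoc_30:
    KOUNT = 3;
ftoc_40:
    ;
    return KOUNT;
}
```

The assigned GOTO could be handled similarly by storing a small integer code in the ASSIGN variable and dispatching on it with a switch.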
4.2 Testing and Performance Measurement

Testing:

A converter needs to be tested in a manner similar to
the testing of a compiler. To test a compiler, a test suite of
programs is developed which is designed to exercise all the
features of the source language. This test suite contains both
correct and incorrect programs. A converter is meant to
translate correct source programs only, so to test a converter a
test suite of correct programs must be developed. These are
converted to the target language, and the converted programs are
compiled and executed on the target platform. Using a second test
suite of input data sets, one then checks whether they behave
exactly the same way as the original programs in the source
language.
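The final check, running the F77 program and its C translation on the same input data set and comparing their behaviour, reduces to comparing the two outputs. A minimal sketch of that comparison step; the function name first_difference and the exact line-by-line criterion are assumptions, and real tests may need some tolerance for floating-point formatting differences.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compares the output of the original F77 program with the output of
   its C translation, line by line. Returns 0 if they are identical,
   otherwise the 1-based number of the first differing line.
   Assumes output lines are shorter than the 1024-byte buffers. */
long first_difference(FILE *f77_out, FILE *c_out)
{
    char a[1024], b[1024];
    long line = 0;

    for (;;) {
        char *ra = fgets(a, sizeof a, f77_out);
        char *rb = fgets(b, sizeof b, c_out);
        line++;
        if (ra == NULL && rb == NULL)
            return 0;              /* both ended together: identical */
        if (ra == NULL || rb == NULL || strcmp(a, b) != 0)
            return line;           /* one output shorter, or lines differ */
    }
}
```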

No standard test suite was available for testing F2C, and
given the time constraints, developing an entire suite of programs
for testing F2C could not have been accomplished. So a different
approach was followed. A few programs were written to test
carefully as many features as possible. Then a random collection
of about fifty F77 programs was made to build a suite of programs
to test F2C.

The testing of F2C was done in several stages. 

1. Testing the suite of F77 programs: The suite of
programs was compiled on an HP 9000 Series 850 machine using
the HP-F77 compiler. All the programs were accepted by the
compiler and no errors were reported.

2. Lexical Analyser: The lexical analyser was first tested
using the few specially written F77 programs, and the debugger
output was carefully read to see that the lexical analyser
behaved as expected on the input. After that, the other F77
programs were passed through the lexical analyser and the bugs
discovered at that stage were removed.

3. Parser: The same approach that was used for the lexical
analyser was used here, except that the input F77 programs
now went through the lexical analyser developed earlier.
The parser was debugged until it behaved as expected. Some
simplifications have been made in the grammar, but they do
not affect most of the programs.

4. Tree Builder: As with the first three phases, the
tree builder was tested on the suite of programs, and for a
few programs the syntax tree built was examined in detail to
check correctness.

5. Tree Transformer and Unparser: These two were tested
together on a few specially written programs and not on the
suite of programs used for testing the other stages, because
the F77 features mentioned in Section 4.1 above have not been
implemented for these stages.

Performance Measurement:

To measure the speed of the converter, some large sample
F77 programs were made, and the time taken by the converter to
translate each of them was determined on a Sun-3 system. A sample
of this information is given below:


Program name    No. of lines    CPU time (sec)    CPU time (sec)
                                user code         system code

prog.f                               8.1               1.1
initmod.f                            7.8               1.3
(illegible)          797             8.7               1.7

Total                901            24.6               4.1
It can be seen that the approximate time for
translating about 1000 lines is nearly 30 seconds. So the
converter speed may be taken to be about 2000 lines of code per
minute on a Sun-3 system.
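Counting user and system CPU time together, the totals in the table give the rate directly; the result is close to the rounded figure of 2000 lines per minute:

```latex
\begin{align*}
t_{\mathrm{CPU}} &= 24.6\ \mathrm{s} + 4.1\ \mathrm{s} = 28.7\ \mathrm{s} \quad \text{(for 901 lines)}\\
\text{speed} &\approx \frac{901\ \text{lines}}{28.7\ \mathrm{s}} \times 60\ \frac{\mathrm{s}}{\mathrm{min}} \approx 1884\ \text{lines/min}
\end{align*}
```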

4.3 Sizes of Specifications and Programs

One of the main ideas of this thesis has been to
generate programs using language processing tools and to reduce
traditional programming. The table below shows the sizes of the
specifications for the various tools and the sizes of the
programs generated by them.


Tool                Specifications     C programs generated
                    (no. of lines)     (no. of lines)

Lex                       325                3346
Yacc                     2982                3761
Treegen                  1530                3452
Other C routines          494                 494

Total                    5331               11053


It can be seen that there has been a saving of more than
50% in the size of the user-written code. More than that, an
enormous amount of time is saved because of the ease with which
the specifications can be debugged and modified to suit changes
in design.
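Taking the saving as the fraction of the final C code that did not have to be written by hand, the totals in the table give a saving of about 52%:

```latex
\[
1 - \frac{5331}{11053} \approx 1 - 0.482 = 0.518
\]
```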



4.4 From Program to Software Package

A source code converter software package which can be
given as a product to end users would need much larger support
than what could be provided by a converter like F2C (even if it
were to accept the complete source language).

Any program that is moved from one platform to another is
also being moved from one programming environment to another. All
the components of the environment used or tacitly assumed by the
source program must also be considered while translating it
and/or moving it to a different platform. These components include
system calls, library functions, special graphics and CAD support
routines, etc.

In such a situation, the successful translation and
porting of a source program would need a series of converters
linking the source and target environments. One of these
converters would translate the system calls, a second would
perhaps convert all mathematical library routine calls to those
available on the target system, a third would take care of
graphics and CAD routines, and so on. A source language converter
like F2C would indeed be the last one in this series of
converters, finally producing programs in the target language
that make use of the program-supporting environment available on
the target platform.
APPENDIX A


C     FUNCTION TO COMPUTE THE VECTOR SUM OF A 1-D ARRAY
C     OF INTEGERS AND FIND IF IT IS ZERO.
C     RETURNS .TRUE. IF VECTOR SUM IS ZERO ELSE RETURNS
C     .FALSE.
      LOGICAL FUNCTION VSUMZERO(A, NUMEL)
C     A - ARRAY NAME. NUMEL - NUMBER OF ELEMENTS IN ARRAY.
      INTEGER A, VSUM
      DIMENSION A(*)
 7001 VSUM = 0
      DO 10 I = 1, NUMEL
         VSUM = VSUM + A(I)
   10 CONTINUE
      IF (VSUM .EQ. 0) THEN
         VSUMZERO = .TRUE.
      ELSE
         VSUMZERO = .FALSE.
      END IF
      RETURN
      END

Sample F1: An F77 Function Subprogram



/* A FUNCTION TO COMPUTE THE VECTOR SUM OF A 1-D ARRAY */
/* OF INTEGERS AND FIND IF IT IS ZERO. */
/* RETURNS .TRUE. IF VECTOR SUM IS ZERO ELSE RETURNS */
/* .FALSE. */

int VSUMZERO(A, NUMEL)
int NUMEL;
int A[];
{
    /* A - ARRAY NAME. NUMEL - NUMBER OF ELEMENTS IN ARRAY. */
    int I;
    int VSUM;
    int ftoc_ret_val;

ftoc_7001:
    VSUM = 0;
    for(I = 1; ((NUMEL) - I)/1 > 0; I += 1){
        VSUM = VSUM+A[I];
ftoc_10:
        ;
    }
    if(VSUM == 0){
        ftoc_ret_val = 1;
    } else {
        ftoc_ret_val = 0;
    }
    return ftoc_ret_val;
}


Sample C1: Sample F1 translated into C




